Locality-Improved FFT Implementation on a Graphics Processor
Abstract
The growing computational power of modern graphics processing units (GPUs) makes them increasingly suitable for general-purpose computing. These commodity processors generally operate as parallel SIMD platforms, and the efficiency of code running on them depends, among other factors, on proper exploitation of the underlying memory hierarchy. This paper deals with the implementation of the Fast Fourier Transform (FFT) on a novel graphics architecture recently introduced by NVIDIA. The implementation takes memory reference locality into account, which is crucial for achieving a high degree of parallelism, that is, good occupancy of the processing elements. The proposed implementation has been tested and compared with the manufacturer's own implementation.
Keywords: Fast Fourier Transform (FFT), Graphics Processing Unit (GPU), memory locality, parallel processing
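The abstract does not describe the decomposition the authors use. The sketch below is a hedged illustration of why locality matters. It uses the classic four-step FFT, which is a standard technique and not necessarily the paper's method. The four-step FFT splits a length-N transform (N = n1 * n2) into many short, independent FFTs. On a GPU, each short FFT could be assigned to one thread block and kept in fast on-chip memory. Here plain Python stands in for that. The names `fft_radix2` and `fft_four_step` are hypothetical. The index mapping assumed is n = n2*a + b for the input and k = c + n1*d for the output.

```python
import cmath
from typing import List


def fft_radix2(x: List[complex]) -> List[complex]:
    """Iterative radix-2 Cooley-Tukey FFT, X[k] = sum_n x[n] * e^(-2*pi*i*n*k/N).

    len(x) must be a power of two. In this sketch it stands in for the short
    FFT that one GPU thread block would compute in on-chip memory.
    """
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(x)
    # Bit-reversal permutation of the input.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) stages of butterflies on contiguous sub-blocks.
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
        size *= 2
    return a


def fft_four_step(x: List[complex], n1: int, n2: int) -> List[complex]:
    """Four-step FFT of length N = n1 * n2 built from short independent FFTs."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Step 1: n2 independent length-n1 FFTs over strided columns x[n2*a + b].
    cols = [fft_radix2([x[n2 * a + b] for a in range(n1)]) for b in range(n2)]
    # Step 2: multiply by twiddle factors w_N^(b*c), with w_N = e^(-2*pi*i/N).
    for b in range(n2):
        for c in range(n1):
            cols[b][c] *= cmath.exp(-2j * cmath.pi * b * c / n)
    # Step 3: n1 independent length-n2 FFTs over b.
    rows = [fft_radix2([cols[b][c] for b in range(n2)]) for c in range(n1)]
    # Step 4: reorder (a transpose): X[c + n1*d] = rows[c][d].
    out = [0j] * n
    for c in range(n1):
        for d in range(n2):
            out[c + n1 * d] = rows[c][d]
    return out
```

For example, `fft_four_step(x, 4, 8)` on a 32-element list should agree with `fft_radix2(x)` up to rounding error. Step 1 reads the input with stride n2, and step 4 writes the output with stride n1. On real GPU hardware, these strided passes are where memory coalescing and staging data in on-chip memory typically become the performance-critical concerns.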
Similar resources
Memory Locality Exploitation Strategies for FFT on the CUDA Architecture
Modern graphics processing units (GPUs) are becoming more and more suitable for general-purpose computing due to their growing computational power. These commodity processors generally follow a parallel SIMD execution model whose efficiency depends, among other factors, on proper exploitation of the explicit memory hierarchy. In this paper we analyze the implementation of the Fast Fourier Tr...
Massively Parallel FFT Algorithm for the NVIDIA Tesla GPU
The emergence of streaming multicore processors with multi-SIMD architectures opens unprecedented opportunities for executing many sophisticated signal processing algorithms, including FFTs, faster and within a much lower energy budget. We report on the development, implementation, and demonstration of a novel, massively parallel computational scheme for the FFT that exploits the capabilities o...
Design and Implementation of Pipelined Floating Point Fast Fourier Transform Processor
There are several methodologies and techniques that already offer hardware and software solutions for computing the Fast Fourier Transform (FFT), each with advantages for specific applications. These solutions are developed to run on several platforms, such as GPU, DSP, FPGA and ASIC, and they are usually written in C/C++ or an HDL. Implementation intended for reconfigurable logic is u...
Computing discrete transforms on the Cell Broadband Engine
Discrete transforms are of primary importance and serve as fundamental kernels in many computationally intensive scientific applications. In this paper, we investigate the performance of two such algorithms, the Fast Fourier Transform (FFT) and the Discrete Wavelet Transform (DWT), on the Sony–Toshiba–IBM Cell Broadband Engine (Cell/B.E.), a heterogeneous multicore chip architected for intensive gaming applicat...
The nonequispaced FFT on graphics processing units
Without doubt, the fast Fourier transform (FFT) ranks among the algorithms with a large impact on science and engineering. By appropriate approximations, this scheme has been generalized to arbitrary spatial sampling points. This so-called nonequispaced FFT is the core of the sequential NFFT3 library, and we discuss its computational costs in detail. On the other hand, programmable graphics proces...